Learning to Collaborate from Delayed Rewards in Foraging Like Environments
Authors
Abstract
Machine learning techniques are usually applied to coordination problems and to competitive games, but not to collaborative ones. Collaboration and coordination are different: in coordination, the task cannot be completed by a single agent, while in collaboration it can be solved by one agent or by a team, but the use of several agents must be reflected in the performance of the system. In this work, the authors propose the use of influence value reinforcement learning (IVRL) for collaborative problems and test it in a foraging game. In earlier work, the authors showed experimentally that, in coordination problems, the IVRL paradigm performs better than the traditional paradigms (independent learning and joint action learning). Thus, in this paper, the authors compare their paradigm (IVRL) with the traditional ones in order to establish whether reinforcement learning is well suited to collaboration problems, and show that the proposed paradigm performs better than the traditional ones.
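The independent-learning baseline that the abstract compares against can be sketched with tabular Q-learning on a toy foraging task. The environment below (a 1-D strip with a single delayed reward at the food cell) and all parameters are illustrative assumptions, not the paper's actual setup:

```python
import random

def independent_q_learning(episodes=2000, size=5, alpha=0.5, gamma=0.9,
                           eps=0.1, seed=0):
    """Tabular Q-learning for a single forager on a 1-D strip of cells.

    The agent starts at cell 0; the only reward (1.0) arrives when it
    reaches the food at the last cell, so feedback is delayed.
    """
    rng = random.Random(seed)
    actions = (1, -1)  # step right / step left
    q = {(s, a): 0.0 for s in range(size) for a in actions}
    for _ in range(episodes):
        s = 0
        while s != size - 1:
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda b: q[(s, b)])
            s2 = min(max(s + a, 0), size - 1)
            r = 1.0 if s2 == size - 1 else 0.0
            # standard Q-learning update toward the bootstrapped target
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in actions)
                                  - q[(s, a)])
            s = s2
    return q
```

An independent learner like this treats other agents, if any, as part of the environment; IVRL and joint action learning differ precisely in how the update incorporates the other agents' behaviour.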
Similar resources
Adaptive intertemporal preferences in foraging-style environments
Decision makers often face choices between smaller, more immediate rewards and larger, more delayed rewards. For example, when foraging for food, animals must choose between actions that have varying costs (e.g., effort, duration, energy expenditure) and varying benefits (e.g., amount of food intake). The combination of these costs and benefits determines what optimal behavior is. In the present s...
Short-term gains, long-term pains: how cues about state aid learning in dynamic environments.
Successful investors seeking returns, animals foraging for food, and pilots controlling aircraft all must take into account how their current decisions will impact their future standing. One challenge facing decision makers is that options that appear attractive in the short-term may not turn out best in the long run. In this paper, we explore human learning in a dynamic decision making task wh...
The application of temporal difference learning in optimal diet models.
An experience-based aversive learning model of foraging behaviour in uncertain environments is presented. We use Q-learning as a model-free implementation of temporal difference learning, motivated by growing evidence for neural correlates in natural reinforcement settings. The predator has the choice of including an aposematic prey in its diet or foraging on alternative food sources. We show h...
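The diet-choice setting described above can be sketched as stateless Q-learning over two actions. The payoff means and noise below are hypothetical values chosen only to illustrate the update rule, not figures from the paper:

```python
import random

def diet_q_learning(trials=2000, alpha=0.1, eps=0.1, seed=1):
    """Stateless Q-learning for a predator's diet choice.

    Action 0 attacks the aposematic (chemically defended) prey;
    action 1 forages on the alternative food source.  The mean
    payoffs below are illustrative assumptions.
    """
    rng = random.Random(seed)
    means = (-0.5, 0.2)  # hypothetical average payoff of each action
    q = [0.0, 0.0]
    for _ in range(trials):
        # epsilon-greedy choice between the two diet options
        a = rng.randrange(2) if rng.random() < eps else q.index(max(q))
        r = means[a] + rng.uniform(-0.1, 0.1)  # noisy payoff sample
        q[a] += alpha * (r - q[a])  # exponential running-average update
    return q
```

After enough aversive samples the learner's value for attacking the defended prey drops below that of the alternative, and the greedy policy excludes it from the diet.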
Comparison of Reinforcement and Supervised Learning Methods in Farmer-Pest Problem with Delayed Rewards
In this paper we propose a method based on the time-window idea which allows agents to generate their strategy using supervised learning algorithms in environments with delayed rewards. It is universal and can be used in various environments. The learning speed of the proposed method and of a reinforcement learning algorithm are compared in a Farmer-Pest problem with delayed rewards. Farmer-Pest problem ...
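The time-window idea can be sketched as a preprocessing step that turns delayed-reward episodes into labeled examples for an ordinary supervised learner. The construction below is an assumption about one plausible form of it, not the paper's exact method:

```python
def time_window_dataset(episodes, window=3):
    """Convert episodes with a single delayed terminal reward into
    supervised training examples.

    Each input is the window of the last `window` steps (e.g. state or
    state-action codes) seen so far, labeled with the episode's final
    reward.  A sketch of the time-window idea; the paper's exact
    construction may differ.
    """
    data = []
    for steps, reward in episodes:
        for i in range(len(steps)):
            win = tuple(steps[max(0, i - window + 1): i + 1])
            data.append((win, reward))
    return data
```

Any off-the-shelf classifier or regressor can then be trained on `data`, sidestepping the credit-assignment machinery that reinforcement learning uses for delayed rewards.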
Copy-when-uncertain: bumblebees rely on social information when rewards are highly variable
To understand the relative benefits of social and personal information use in foraging decisions, we developed an agent-based model of social learning that predicts social information should be more adaptive where resources are highly variable and personal information where resources vary little. We tested our predictions with bumblebees and found that foragers relied more on social information...